PDF articles metadata harvester
Abstract
Scientific journals are very important for recording the findings of researchers around the world. The current medium for disseminating scientific journals is PDF. One scheme for finding scientific journal articles on the internet is via metadata. Metadata stores summary information about an article. Embedding metadata into the PDF of a scientific article ensures that the metadata can be read consistently. Harvesting metadata from scientific journals is an active field at the moment. This paper discusses scientific journal metadata harvesters involving XMP.
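As a rough illustration of what harvesting embedded XMP metadata can involve, the following minimal sketch scans the raw bytes of a PDF for an XMP packet and reads a few Dublin Core fields from it. It is not the harvester described in the paper. It assumes the XMP packet is stored uncompressed in the file, which is common but not guaranteed, and the names `extract_xmp_packet`, `harvest_dublin_core`, and `SAMPLE_PACKET` are hypothetical, chosen only for this example.

```python
import re
import sys
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core as used inside XMP packets.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Made-up packet for demonstration only; the title and author are placeholders.
SAMPLE_PACKET = b"""<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Example title</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Example Author</rdf:li></rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def extract_xmp_packet(pdf_bytes):
    """Return the body of the first XMP packet in the raw bytes, or None.

    Assumption: the packet is not compressed or encrypted in the file.
    """
    match = XPACKET.search(pdf_bytes)
    return match.group(1).strip() if match else None


def harvest_dublin_core(xmp_bytes):
    """Collect selected Dublin Core values from an XMP packet body."""
    root = ET.fromstring(xmp_bytes)
    record = {}
    for field in ("title", "creator", "description", "subject"):
        values = []
        for element in root.iter("{%s}%s" % (DC_NS, field)):
            # Values are rdf:li items inside rdf:Alt, rdf:Seq, or rdf:Bag.
            for item in element.iter("{%s}li" % RDF_NS):
                if item.text and item.text.strip():
                    values.append(item.text.strip())
        if values:
            record[field] = values
    return record


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Read a real PDF given on the command line.
        with open(sys.argv[1], "rb") as handle:
            data = handle.read()
    else:
        # Fall back to the synthetic packet wrapped in PDF-like bytes.
        data = b"%PDF-1.4\nstream\n" + SAMPLE_PACKET + b"\nendstream\n%%EOF"
    packet = extract_xmp_packet(data)
    print(harvest_dublin_core(packet) if packet else "no XMP packet found")
```

A byte scan keeps the sketch free of third-party dependencies; a production harvester would more likely use a PDF library to locate and decode the metadata stream.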
Similar resources
The CARL metadata harvester and search service
Purpose – To explain the background, functionality, and content of the CARL metadata harvester and search service, http://carl-abrc-oai.lib.sfu.ca/, and to outline plans for improving the service. Design/methodology/approach – This case study employs simple statistical analyses to a set of harvested metadata. Findings – This paper documents the use of unqualified Dublin Core (uDC) elements in t...
Genre Classification in Automated Ingest and Appraisal Metadata
Metadata creation is a crucial aspect of the ingest of digital materials into digital libraries. Metadata needed to document and manage digital materials are extensive and manual creation of them expensive. The Digital Curation Centre (DCC) has undertaken research to automate this process for some classes of digital material. We have segmented the problem and this paper discusses results in gen...
SciPDFindexer: Distributed Information Retrieval system using MapReduce
Indexing allows the conversion of raw document collections into easily searchable formats. Bigger scale indexing poses some challenges in terms of efficiently distributing indexing computation on a cluster of nodes. MapReduce framework promises to be an effective tool for parallelizing such tasks as inverted index construction. We propose SciPDFindexer, a distributed information retrieval syste...
A Novel Parallel Architecture Design of Information Retrieval System for Scientific Papers
Indexing allows converting raw document collection into easily searchable representation. Bigger scale indexing poses some challenges such as how to distribute indexing computation efficiently on a cluster of nodes. MapReduce framework can be an effective tool for parallelizing such tasks as inverted index construction. We propose SciPDFindexer, distributed information retrieval system for scie...
Retrieving Metadata for Your Local Scholarly Papers
We present a novel approach to retrieve metadata to scholarly papers stored locally as PDF files. A fingerprint is produced from the PDF fulltext to query an online metadata repository. The returned results are matched back to identify the correct metadata entry. These metadata can then be stored in the PDF itself, indexed for a desktop search engine, and collected in a user's or community's bi...
Journal: CoRR
Volume: abs/1301.6591
Publication date: 2013